Delhi Metro Network Analysis¶

Importing libraries¶

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

Importing data from CSV file¶

In [2]:
df = pd.read_csv("Delhi metro.csv")
In [3]:
df.head(10)
Out[3]:
ID (Station ID) Station Names Dist. From First Station(km) Metro Line Opened(Year) Layout Latitude Longitude
0 1 Shaheed Sthal(First Station) 0.0 Red line 08-03-2019 Elevated 28.670611 77.415582
1 2 Hindon River 1.0 Red line 08-03-2019 Elevated 28.878965 77.415483
2 3 Arthala 2.5 Red line 08-03-2019 Elevated 28.676999 77.391892
3 4 Mohan Nagar 3.2 Red line 08-03-2019 Elevated 28.606319 77.106082
4 5 Shyam park 4.5 Red line 08-03-2019 Elevated 28.698807 28.698807
5 6 Major Mohit Sharma 5.7 Red line 08-03-2019 Elevated 28.677611 77.358143
6 7 Raj Bagh 6.9 Red line 08-03-2019 Elevated 28.640860 77.209500
7 8 Shaheed Nagar 8.2 Red line 08-03-2019 Elevated 28.530780 77.212057
8 9 Dilshad Garden 9.4 Red line 04-06-2008 Elevated 28.675920 77.321420
9 10 Jhil Mil 10.3 Red line 04-06-2008 Elevated 28.675790 77.312390
In [4]:
df.tail()
Out[4]:
ID (Station ID) Station Names Dist. From First Station(km) Metro Line Opened(Year) Layout Latitude Longitude
280 2 Shivaji Stadium 1.9 Orange line 23-02-2011 Underground 28.62901 77.21190
281 3 Dhaula Kuan [Conn: Pink] 8.3 Orange line 15-08-2011 Elevated 28.59178 77.16155
282 4 Delhi Aerocity 14.5 Orange line 15-08-2011 Underground 28.54881 77.12092
283 5 IGI Airport 17.9 Orange line 23-02-2011 Underground 28.55693 77.08669
284 6 Dwarka Sector 21 [Conn: Blue] 20.8 Orange line 23-02-2011 Underground 28.55226 77.05828
In [5]:
df.shape
Out[5]:
(285, 8)

Exploratory Data Analysis¶

In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 285 entries, 0 to 284
Data columns (total 8 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   ID (Station ID)               285 non-null    int64  
 1   Station Names                 285 non-null    object 
 2   Dist. From First Station(km)  285 non-null    float64
 3   Metro Line                    285 non-null    object 
 4   Opened(Year)                  285 non-null    object 
 5   Layout                        285 non-null    object 
 6   Latitude                      285 non-null    float64
 7   Longitude                     285 non-null    float64
dtypes: float64(3), int64(1), object(4)
memory usage: 17.9+ KB
In [7]:
df.describe()
Out[7]:
ID (Station ID) Dist. From First Station(km) Latitude Longitude
count 285.000000 285.000000 285.000000 285.000000
mean 16.214035 19.218947 28.595428 77.029315
std 11.461808 14.002862 0.091316 2.875400
min 1.000000 0.000000 27.920862 28.698807
25% 6.000000 7.300000 28.545828 77.107130
50% 14.000000 17.400000 28.613453 77.207220
75% 24.000000 28.800000 28.666360 77.281165
max 49.000000 52.700000 28.878965 77.554479
In [8]:
# Check for missing values
df.isnull().sum()
Out[8]:
ID (Station ID)                 0
Station Names                   0
Dist. From First Station(km)    0
Metro Line                      0
Opened(Year)                    0
Layout                          0
Latitude                        0
Longitude                       0
dtype: int64
In [9]:
# Drop rows with any null values
df_cleaned = df.dropna()
In [10]:
df_cleaned.isnull().sum()
Out[10]:
ID (Station ID)                 0
Station Names                   0
Dist. From First Station(km)    0
Metro Line                      0
Opened(Year)                    0
Layout                          0
Latitude                        0
Longitude                       0
dtype: int64

Import Libraries¶

In [11]:
from seaborn import pairplot
import seaborn as sns
import folium
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = "plotly_white"
%matplotlib inline
In [12]:
# Print unique values in the 'Metro Line' column
print(df_cleaned['Metro Line'].unique())
['Red line' 'Yellow line' 'Blue line' 'Blue line branch'
 'Green line branch' 'Green line' 'Rapid Metro' 'Voilet line'
 'Magenta line' 'Pink line' 'Aqua line' 'Gray line' 'Orange line']

Pie Plot for Metro Line¶

In [13]:
import pandas as pd
import matplotlib.pyplot as plt

# Create a dictionary to map Metro Line names to colors
line_colors = {
    'Red line': 'red',
    'Yellow line': 'yellow',
    'Blue line': 'blue',
    'Blue line branch': 'navy',
    'Green line branch': 'forestgreen',
    'Green line': 'green',
    'Rapid Metro': 'silver',
    'Voilet line': 'purple',
    'Magenta line': 'magenta',
    'Pink line': 'pink',
    'Aqua line': 'skyblue',
    'Gray line': 'gray',
    'Orange line': 'orange'
}

# Calculate value counts for each Metro Line
line_counts = df_cleaned['Metro Line'].value_counts()

# Create a pie chart using custom colors
plt.pie(line_counts, labels=line_counts.index, autopct='%1.1f%%', colors=[line_colors[line] for line in line_counts.index])

# Set aspect ratio to be equal, so the pie is drawn as a circle.
plt.axis('equal')

# Add a title
plt.title('Metro Line Distribution')

# Display the pie chart
plt.show()
No description has been provided for this image

Creating an Interactive Map of Delhi Metro Stations¶

In [14]:
# Create a base map centered on Delhi
delhi_map = folium.Map(location=[28.6139, 77.2090], zoom_start=12)

# Add markers for each metro station
for index, row in df_cleaned.iterrows():
    folium.Marker([row['Latitude'], row['Longitude']], popup=row['Station Names']).add_to(delhi_map)

# Display the map
display(delhi_map)
Make this Notebook Trusted to load map: File -> Trust Notebook

Metro Line Analysis¶

Bar Plot of Station Counts by Metro Line:¶

In [15]:
df['Metro Line'].value_counts().plot(kind='bar')
plt.xlabel('Metro Line')
plt.ylabel('Number of Stations')
plt.title('Number of Stations per Metro Line')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
No description has been provided for this image

Histogram of Distance from First Station:¶

In [16]:
plt.hist(df_cleaned['Dist. From First Station(km)'], bins=20)
plt.xlabel('Distance from First Station (km)')
plt.ylabel('Frequency')
plt.title('Distribution of Distances from First Station')
plt.tight_layout()
plt.show()
No description has been provided for this image

Line Plot of Yearly Openings:¶

In [17]:
yearly_openings = df.groupby('Opened(Year)').size()
yearly_openings.plot(kind='line', marker='o')
plt.xlabel('Year')
plt.ylabel('Number of Stations Opened')
plt.title('Yearly Openings of Metro Stations')
plt.tight_layout()
plt.show()
No description has been provided for this image

Bar Plot of Layout by Metro Line:¶

In [18]:
# Create a cross-tabulation of Layouts and Metro Lines
layout_by_line = pd.crosstab(df_cleaned['Layout'], df_cleaned['Metro Line'])

# Define line colors according to the pattern
line_colors = {
    'Red line': 'red',
    'Yellow line': 'yellow',
    'Blue line': 'blue',
    'Blue line branch': 'blue',
    'Green line branch': 'green',
    'Green line': 'green',
    'Rapid Metro': 'silver',
    'Voilet line': 'purple',  
    'Magenta line': 'magenta',
    'Pink line': 'pink',
    'Aqua line': 'skyblue',
    'Gray line': 'gray',
    'Orange line': 'orange'
}

# Create a list of colors corresponding to each line
colors = [line_colors[line] for line in layout_by_line.columns]

# Plot a stacked bar plot with specified colors
layout_by_line.plot(kind='bar', stacked=True, figsize=(10, 6), color=colors)
plt.xlabel('Layout')
plt.ylabel('Count')
plt.title('Layout Distribution by Metro Line')
plt.legend(title='Metro Line', loc='upper left', bbox_to_anchor=(1, 1))
plt.tight_layout()
plt.show()
No description has been provided for this image

Station Layout Analysis¶

In [19]:
layout_counts = df_cleaned['Layout'].value_counts()

# creating the bar plot using Plotly
fig = px.bar(x=layout_counts.index, y=layout_counts.values,
             labels={'x': 'Layout', 'y': 'Number of Stations'},
             title='Distribution of Delhi Metro Station Layouts',
             color=layout_counts.index,
             color_continuous_scale='pastel')

# updating layout for better presentation
fig.update_layout(xaxis_title="Layout",
                  yaxis_title="Number of Stations",
                  coloraxis_showscale=False,
                  template="plotly_white")

fig.show()

MAPPING OUT¶

In [20]:
pip install folium
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: folium in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (0.16.0)
Requirement already satisfied: branca>=0.6.0 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from folium) (0.7.2)
Requirement already satisfied: jinja2>=2.9 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from folium) (3.1.3)
Requirement already satisfied: numpy in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from folium) (1.26.4)
Requirement already satisfied: requests in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from folium) (2.31.0)
Requirement already satisfied: xyzservices in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from folium) (2024.4.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from jinja2>=2.9->folium) (2.1.5)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from requests->folium) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from requests->folium) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from requests->folium) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\rajba\appdata\local\packages\pythonsoftwarefoundation.python.3.12_qbz5n2kfra8p0\localcache\local-packages\python312\site-packages (from requests->folium) (2024.2.2)
Note: you may need to restart the kernel to use updated packages.

LAYOUT ACROSS GEOGRAPHY¶

In [21]:
import folium
m = folium.Map((28.614740,77.207270))


for index,row in df.iterrows():
    if row['Layout']=='Elevated':
        c = 'blue'
    elif row['Layout']=='Underground':
        c='red'
    elif row['Layout']=='At-Grade':
        c='green'
    else:
        c='black'
    
    folium.CircleMarker(
    location=[row['Latitude'],row['Longitude']],
    tooltip=row['Station Names'],
    popup=row['ID (Station ID)'],
    radius=4,
    fill=True,
    color=c,
    opacity=5,
    fill_color=c,
    
    ).add_to(m)
m
Out[21]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Observations:¶

Elevated Stations: Most of the stations are elevated. In metropolitan locations, it is a typical design decision to conserve space and lessen land acquisition concerns.
Underground Stations: In comparison to Elevated stations, there are fewer stations below ground. These are probably in central or heavily inhabited locations, where it is less practical to build above ground.
At-Grade Stations: The scarcity of At-Grade (ground level) stations in the network suggests that they are less common, perhaps because of land and traffic issues.

Summary:¶

Metro Network Analysis involves examining the network of metro systems to understand their structure, efficiency, and effectiveness. Analyzing routes, stations, traffic, connectivity, and other operational factors are usually included.